A Language Model for Parsing Very Long Chinese Sentences
Author
Abstract
According to corpus analyses, about seventy-five percent of Chinese sentences are composed of more than two sentence segments separated by commas or semicolons. A segment may be a sentence, a noun phrase, a verb phrase, an adjective phrase, an adverbial phrase, or a prepositional phrase. An NP segment may serve as the subject of the next segment or the object of the previous segment. The empty category pro may also appear in a VP segment. The great freedom in the use of pros, the large number of segments, the variety of segment types, and the associativity problem make sentence parsing difficult, and few parsing systems deal with these problems. This paper treats the segment as the basic parsing unit and uses characteristic words, verb subcategories, topic chains, and some heuristic rules to link segments into meaningful units. The pro resolution and the segment linking are useful for practical applications.
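The abstract's starting point, treating each comma- or semicolon-delimited segment as a parsing unit, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `split_segments`, the delimiter set, and the example sentence are assumptions made here for illustration, and the sketch covers only the splitting step, not pro resolution or segment linking.

```python
import re
from typing import List

# Assumption: segments are delimited by full-width or ASCII commas and semicolons.
SEGMENT_DELIMITERS = re.compile(r"[，,；;]")

# Assumption: these sentence-final marks are stripped before splitting.
SENTENCE_FINAL = "。.!?！？"


def split_segments(sentence: str) -> List[str]:
    """Split a sentence into segments at commas and semicolons."""
    body = sentence.strip().rstrip(SENTENCE_FINAL)
    # Discard empty pieces produced by adjacent or trailing delimiters.
    return [seg.strip() for seg in SEGMENT_DELIMITERS.split(body) if seg.strip()]


if __name__ == "__main__":
    # Hypothetical example: the second segment has no overt subject,
    # the kind of gap that the abstract calls pro.
    for segment in split_segments("他买了一本书，很有意思；我也想看。"):
        print(segment)
```

A linker in the spirit of the abstract would then decide, for each segment, whether it attaches to the previous or the next one. That step depends on the characteristic words and verb subcategories the paper refers to, which are not given here.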
Similar resources
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences and produces a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging together with dependency parsing in a pipeline mode. Unfortunately, in pipel...
Segmentation of Chinese Long Sentences Using Commas
The comma is the most common form of punctuation. As such, it may have the greatest effect on the syntactic analysis of a sentence. Because Chinese is an isolating language, its sentences offer fewer cues for parsing, and the cues for segmenting a long Chinese sentence are fewer still. However, the average frequency of comma usage in Chinese is higher than in other languages. The comma plays an important role i...
An Algorithm Combining Statistics-based and Rules-based for Chunk Identification of Chinese Sentences
Natural language processing (NLP) is a very active research domain. One important branch of it is sentence analysis, including Chinese sentence analysis. However, no mature theories and techniques for deep analysis are currently available. An alternative is to perform shallow parsing on sentences, which is very popular in the domain. Chunk identification is a fundamental task for shallow parsi...
A Hierarchical Parsing Approach with Punctuation Processing for Long Chinese Sentences
(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China) Abstract: Based on an analysis of the usage and syntactic function of Chinese punctuation, this paper proposes a new hierarchical approach to parsing long Chinese sentences. In traditional parsing approaches, the parsing procedure is performed on one level and the ...
A Model for Robust Chinese Parser
The Chinese language has many special characteristics which are substantially different from western languages, causing conventional methods of language processing to fail on Chinese. For example, Chinese sentences are composed of strings of characters without word boundaries that are marked by spaces. Therefore, word segmentation and unknown word identification techniques must be used in order...
Systematic Processing of Long Sentences in Rule Based Portuguese-Chinese Machine Translation
Francisco Oliveira, Fai Wong and Iok-Sai Hong. Translation quality and parsing efficiency are often disappointing when rule-based machine translation systems deal with long sentences. Due to the complicated syntactic structure of the language, many ambiguous parse trees can be generated during the tr...